Optimizing Speech Recognition Evaluation Using Stratified Sampling
نویسندگان
چکیده
Producing large enough quantities of high-quality transcriptions for accurate and reliable evaluation of an automatic speech recognition (ASR) system can be costly. It is therefore desirable to minimize the manual transcription work for producing metrics with an agreed precision. In this paper we demonstrate how to improve ASR evaluation precision using stratified sampling. We show that by altering the sampling, the deviations observed in the error metrics can be reduced by up to 30% compared to random sampling, or alternatively, the same precision can be obtained on about 30% smaller datasets. We compare different variants for conducting stratified sampling, including a novel sample allocation scheme tailored for word error rate. Experimental evidence is provided to assess the effect of different sampling schemes to evaluation precision.
منابع مشابه
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
متن کاملOptimizing Expected Word Error Rate via Sampling for Speech Recognition
State-level minimum Bayes risk (sMBR) training has become the de facto standard for sequence-level training of speech recognition acoustic models. It has an elegant formulation using the expectation semiring, and gives large improvements in word error rate (WER) over models trained solely using crossentropy (CE) or connectionist temporal classification (CTC). sMBR training optimizes the expecte...
متن کاملImproved spoken language translation using n-best speech recognition hypotheses
We intended to demonstrate the effect of using N -best speech recognition hypotheses for improving speech translation performance. A log-linear model, which integrated features from speech recognition and statistical machine translation, was used to rescore the translation candidates. Model parameters were estimated by optimizing an objectively measurable but subjectively relevant translation q...
متن کاملOptimizing Wavelet Parameters for Dereverberation in Automatic Speech Recognition
We present an optimization method of the wavelet parameters for dereverberation in automatic speech recognition (ASR). By tuning the wavelet parameters to improve the acoustic model likelihood, wavelet-based dereverberation methods become more effective in the ASR application. We evaluate several existing wavelet-based methods and optimize them, based on our proposed scheme. Experimental evalua...
متن کاملOptimization of Cost Function Weights for Unit Selection Speech Synthesis Using Speech Recognition
A well known problem in unit selection speech synthesis is designing the join and target function sub-costs and optimizing their corresponding weights so that they reflect the human listeners’ preferences. To achieve this we propose a procedure where an objective criterion for optimal speech unit selection is used. The objective criterion for tuning the cost function weights is based on automat...
متن کامل